Deploying Multiple Storage Nodes

Overview

While RHQ fully supports running a single storage node, clustering is at the core of the storage node architecture. This guide covers deploying and undeploying multiple storage nodes. Just like the relational database, the storage node is a prerequisite for installing the RHQ server. Additional storage nodes can be deployed either before or after the server is installed.

Deploying Storage Nodes Before Server Installation

Deploying multiple storage nodes prior to installing the server is an advanced setup. It requires manual configuration that is not necessary when storage nodes are deployed after the server is installed.

This guide assumes that each RHQ storage node and the RHQ server run on their own machines. You will be performing a non-default installation of each storage node. If you have not already done so, please review RHQ Storage Node Installation and Storage Properties.

The steps below describe setting up a three node cluster. The same steps apply for any other number of nodes.

Step 1. Determine storage node addresses

Determine the IP addresses of the machines on which storage nodes will be running.

Step 2. Determine cluster configuration settings

The ports used for client requests and for internode communication (i.e., gossip) are cluster-wide settings and must be the same for every storage node. The corresponding properties, which can appear in rhq-storage.properties, are listed below along with their default values.

Property                    Default Value
rhq.storage.cql-port        9142
rhq.storage.gossip-port     7100

See Cluster Settings for more information on shared, cluster-wide settings.

Step 3. Edit rhq-storage.properties

Edit the <rhq-install-dir>/bin/rhq-storage.properties file with these properties:

rhq.storage.cql-port=<your-cql-port>
rhq.storage.gossip-port=<your-gossip-port>
rhq.storage.seeds=address-1,address-2,address-3
start=false

where address-1, address-2, and address-3 are the IP addresses of the machines on which the storage nodes will be installed.
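For example, with the default ports and three hypothetical node addresses, the file would contain:

rhq.storage.cql-port=9142
rhq.storage.gossip-port=7100
rhq.storage.seeds=192.168.100.1,192.168.100.2,192.168.100.3
start=false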

Step 4. Install Storage Nodes

rhqctl install --storage --agent-preference="rhq.agent.server.bind-address=<rhq-server-address>"

Execute this command for each storage node. It installs the storage node as well as the agent. For the agent, we minimally have to specify the server address. If the server uses a non-default port, we would also need to specify it with the rhq.agent.server.bind-port agent preference.
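For example, assuming the RHQ server will run at the hypothetical address 192.168.100.10 and listen on the hypothetical non-default port 9090, and assuming your version of rhqctl accepts the --agent-preference option more than once, the command might look like this:

rhqctl install --storage --agent-preference="rhq.agent.server.bind-address=192.168.100.10" --agent-preference="rhq.agent.server.bind-port=9090"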

Note that the storage node is not started (per the start=false setting in rhq-storage.properties). One additional configuration change needs to be made before starting it.

Step 5. Configure Internode Authentication

Open <rhq-server-dir>/rhq-storage/conf/rhq-storage-auth.conf in a text editor. Change the contents of the file to:

address-1
address-2
address-3

Each storage node address should be on its own line. Order is not important. See RHQ Storage Node Internode Authentication for details.

The storage node needs to be restarted in order for changes to rhq-storage-auth.conf to take effect.
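In this workflow the storage nodes have not yet been started, so no restart is needed here. If you later edit rhq-storage-auth.conf on a node that is already running, restart it; a minimal sketch using rhqctl, assuming the --storage option to limit the command to the storage node:

rhqctl stop --storage
rhqctl start --storage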

Step 6. Start the Storage Nodes

rhqctl start

Step 7. Configure the Server for Installation

On the machine where you will install the RHQ Server, you will have to update <rhq-install-dir>/bin/rhq-server.properties with connection information for the storage nodes. You need to edit the following properties:

  • rhq.storage.nodes

  • rhq.storage.cql-port

  • rhq.storage.gossip-port

rhq.storage.nodes must contain a comma-delimited list of the IP addresses of every storage node that has been deployed prior to the server installation.

rhq.storage.cql-port must match the rhq.storage.cql-port value in rhq-storage.properties.

rhq.storage.gossip-port must match the rhq.storage.gossip-port value in rhq-storage.properties.

Remember that these port settings must be the same for every storage node.
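Continuing with the hypothetical addresses and default ports used earlier, the relevant entries in rhq-server.properties would look like this:

rhq.storage.nodes=192.168.100.1,192.168.100.2,192.168.100.3
rhq.storage.cql-port=9142
rhq.storage.gossip-port=7100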

After you finish editing rhq-server.properties, run rhqctl.

rhqctl install --server --agent

This will install the server and the agent but no storage node.

Upgrading

If you are performing an upgrade instead of a new installation, run

rhqctl upgrade --use-remote-storage-node=true

By default rhqctl will install a storage node. The --use-remote-storage-node option stops rhqctl from installing a storage node alongside the server.

Deploying Nodes After Server Installation

Deploying nodes after server installation is a much simpler process for the user because the server, in coordination with the agent, makes all the changes necessary for the new node to join the cluster. You can find detailed information about the process in the Deploying Storage Nodes design document.

Let's assume that we have a default installation of server, storage node, and agent all running on the same machine. We want to install a second storage node on another machine.

Step 1. Install the storage node

Run rhqctl on the machine on which the new storage node will be installed.

rhqctl install --storage --agent-preference="rhq.agent.server.bind-address=<rhq-server-address>"

By default storage nodes are automatically deployed into the cluster upon installation.

See Cluster Settings for information about disabling automatic deployment. 

Step 2. Check Deployment Progress

Although simpler, deploying additional nodes after the server installation is a lengthy process. You can monitor the deployment in the storage nodes admin UI.

[Screenshot: storage node admin UI showing a deployment in progress]

Notice the Cluster Status column. The last node (10.16.23.185) has a status of JOINING. This indicates that the deployment is in progress. Click on the address link in the Endpoint Address column to go to the details view. There you can view the Operation Mode which shows the current step of the deployment process. This is illustrated in the next screenshot.

[Screenshot: storage node details view showing the Operation Mode during the deployment]

Notice that the Operation Mode field has a value of BOOTSTRAP. Review Deploying Storage Nodes for what the BOOTSTRAP mode entails.

Step 3. Handling Deployment Errors

As with any other distributed system, failures can and will happen with the storage cluster. Failures can occur during the deployment process.

[Screenshot: storage node admin UI showing a failed deployment]

Here is a screenshot of a failed deployment. The node with the address 10.16.23.188 has a cluster status of DOWN. Click the address link in the Endpoint Address column to find additional information about the failure. Notice that 10.16.23.170 is reporting DOWN for its availability. It may have crashed, causing the deployment to fail. To resolve the issue, we restart 10.16.23.170 after addressing any errors (like an OutOfMemoryError) that caused the crash, and then simply redeploy 10.16.23.188. The deployment process will resume where it left off.

Step 4. Verify the Deployment

The deployment is finished and successful when the new node, along with the other cluster nodes, report NORMAL for their cluster statuses. While a deployment is ongoing or if auto deployment is disabled, one or more nodes can have cluster status of INSTALLED. A node that is INSTALLED is not yet part of the cluster. It is important to be aware of this distinction when determining whether or not a deployment has completed.

Undeploying Storage Nodes

Undeploying a storage node is done through the storage node admin UI. Like deployment, it is a multi-step process. The following actions are performed:

  • The storage node is decommissioned (stops participating in the cluster)

  • The storage node is shut down

  • The storage node bits are purged from disk

  • The storage node resource is removed from inventory

For detailed information about the process, see Deploying Storage Nodes.

Similar to deployment, this process can also fail. The cluster status of the node will change to DOWN. Once the reported issues have been addressed, resume the undeployment.

Do not uninventory the storage node resource or any of its child resources. Doing so could compromise the stability of the storage cluster. The storage node resource hierarchy will be removed from inventory as part of the undeployment process.

(Un)Deployment from the CLI

Storage nodes can easily be (un)deployed from the RHQ CLI.

// deploy a storage node
nodes = StorageNodeManager.findStorageNodesByCriteria(StorageNodeCriteria());
// pick the node to deploy; here we simply take the first result
node = nodes.get(0);
StorageNodeManager.deployStorageNode(node);

// undeploy a storage node
nodes = StorageNodeManager.findStorageNodesByCriteria(StorageNodeCriteria());
// pick the node to undeploy; here we simply take the first result
node = nodes.get(0);
StorageNodeManager.undeployStorageNode(node);
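Taking the first result only works when the query returns a single node. A minimal sketch for undeploying a specific node, assuming a hypothetical address of 192.168.100.3 and that the StorageNode objects returned by the query expose their address via getAddress():

// undeploy the node at a specific (hypothetical) address
nodes = StorageNodeManager.findStorageNodesByCriteria(StorageNodeCriteria());
for (i = 0; i < nodes.size(); i++) {
    if (nodes.get(i).getAddress().equals("192.168.100.3")) {
        StorageNodeManager.undeployStorageNode(nodes.get(i));
    }
}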